Heuristic Word Alignment with Parallel Phrases
Abstract
This paper presents a method for word alignment that uses parallel phrases from manually word aligned sentence pairs to align words in new texts. Experiments on an English–Swedish parallel corpus showed that the heuristic phrase-based method produced word alignments with high precision. Furthermore, alignment recall was improved by generalizing phrases with part-of-speech categories. We also compared the phrase-based method to statistical word alignment and found that a combination of phrase-based and statistical word alignments outperformed pure statistical alignment in terms of Alignment Error Rate (AER).
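For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, and AER are commonly computed from sets of alignment links, using the standard definition with sure (S) and possible (P) gold links. The function name `alignment_scores` and the toy links are hypothetical illustrations; the abstract does not state the paper's exact evaluation setup, so this should not be read as the authors' implementation.

```python
def alignment_scores(predicted, sure, possible=None):
    """Return (precision, recall, aer) for sets of (source_idx, target_idx) links.

    Standard definitions, with A = predicted, S = sure, P = possible links:
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    predicted = set(predicted)
    sure = set(sure)
    # By convention, every sure link also counts as a possible link.
    possible = sure | set(possible or ())
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")

    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)

    precision = a_and_p / len(predicted)
    recall = a_and_s / len(sure)
    aer = 1.0 - (a_and_s + a_and_p) / (len(predicted) + len(sure))
    return precision, recall, aer


# Toy data (made up for illustration, not taken from the paper).
gold_sure = {(0, 0), (1, 1), (2, 2)}
gold_possible = {(2, 3)}
system = {(0, 0), (1, 1), (2, 3), (3, 1)}

precision, recall, aer = alignment_scores(system, gold_sure, gold_possible)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

A lower AER is better. Because AER combines precision and recall, adding high-precision links to a statistical alignment can reduce it.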
Related resources
High-precision Word Alignment with Parallel Phrases
We present a method to re-use manual word alignments for alignment of new corpora.
Automatic Phrase Alignment: Using statistical n-gram alignment for syntactic phrase alignment
A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated (i.e. parallel) documents. These parallel sentences are linked through alignment. Much work has been done on sentence and word alignment, but not as much on the intermediate level. This paper explores using n-gram alignment created for statistical machine translation based on GIZA++...
Improving Phrase Extraction via MBR Phrase Scoring and Pruning
One of the major reasons for translation errors in phrase-based SMT systems is the incorrect phrases induced from inaccurately word-aligned parallel data. In this paper, we propose a novel approach that uses the minimum Bayes-risk (MBR) principle to improve the accuracy of phrase extraction. Our approach operates as a four-stage pipeline: first, bilingual phrases are extracted from parallel corpu...
An Integrated Tool for Translation-Memory Maintenance
This paper presents an integrated tool for constructing and maintaining a translation memory for memory-based machine translation. The tool aims to automate the construction and validation of the translation memory at both the word and phrase levels from English-Thai parallel texts. To align English-Thai words and phrases, the crucial problems that must be resolved include multiple-word-expression boundary am...
Sequence segmentation for statistical machine translation
In the last decade, while statistical machine translation has advanced significantly, there is still much room for further improvements relating to many natural language processing tasks such as word segmentation, word alignment and parsing. Human language is composed of sequences of meaningful units. These sequences can be words, phrases, sentences or even articles serving as basic elements in...
Publication date: 2010